
Py2f parallel #457

Open · wants to merge 126 commits into main
Conversation

abishekg7 (Contributor):

No description provided.

Base automatically changed from py2f-with-optimisations to main May 29, 2024 13:39
from icon4py.model.common.settings import device


#try:
Contributor:

This removes ghex and mpi4py being optional dependencies, i.e. that you can run icon4py even if you don't have those libraries installed. We should keep that feature, imho.
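A minimal sketch of how such an optional import is usually guarded, so single-node runs keep working without the parallel stack (the HAS_MPI flag name is an assumption, not part of the PR):

try:
    import ghex
    import mpi4py

    HAS_MPI = True  # parallel features available
except ImportError:
    # ghex and mpi4py stay optional: icon4py must still run without them
    ghex = None
    mpi4py = None
    HAS_MPI = False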

@@ -20,6 +20,13 @@
EdgeDim = Dimension("Edge")
CellDim = Dimension("Cell")
VertexDim = Dimension("Vertex")
SingletonDim = Dimension("Singleton")
Contributor:

What do you need these for? Could you move them to py2fgen? Anything that only concerns the interfacing to Fortran should go to tools/py2fgen and not bloat up the model.


vertex_end_halo = self.grid.get_end_index(VertexDim, HorizontalMarkerIndex.halo(VertexDim))

loc_rank = self._exchange.my_rank()
Contributor:

Either delete the log statements, or keep them but not commented out. You can switch them off globally via the log level.
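For reference, a sketch of the global switch-off using the standard logging module (the logger name is illustrative):

import logging

# raise the threshold for the whole package; DEBUG/INFO calls become cheap no-ops
logging.getLogger("icon4py").setLevel(logging.WARNING)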



@dataclasses.dataclass
class CachedProgram:
Contributor:

Isn't this a duplicate of common/caching.py? It has nothing in particular to do with diffusion, so it should not be here...

@@ -163,13 +163,6 @@ def end(cls, dim: Dimension) -> int:
return cls._end[dim]


@dataclass(frozen=True)
Contributor:

thanks for cleaning up...

)
#processor_props = get_multinode_properties(MultiNodeRun(), comm_id)
#exchange = definitions.create_exchange(processor_props, decomposition_info)
Contributor:

Delete commented-out code, or uncomment it if it remains useful. You could also extract these statements into a function so that they pollute the code here less. You can always control whether they are executed through the log level: if you run with anything above DEBUG, they will not be triggered.

):
logger.info(f"Using Device = {device}")
log.info(f"Using Device = {device}")

# ICON grid
if device.name == "GPU":
Contributor:

You could shorten this to:

Suggested change
-if device.name == "GPU":
+on_gpu = (device.name == "GPU")

or at least use an if-expression if you think that is cryptic:

on_gpu = True if device.name == "GPU" else False

cells_end_index,
vertex_start_index,
vertex_end_index,
edge_start_index,
Contributor:

Add type annotations.
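For illustration, a hedged sketch of what the annotated parameters could look like, assuming the indices arrive as integer arrays from Fortran (the function name and dtypes are assumptions):

import numpy as np

def construct_grid_from_indices(
    cells_start_index: np.ndarray,  # assumed int32, one entry per refinement zone
    cells_end_index: np.ndarray,
    vertex_start_index: np.ndarray,
    vertex_end_index: np.ndarray,
    edge_start_index: np.ndarray,
    edge_end_index: np.ndarray,
) -> None:
    ...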

@@ -77,6 +97,88 @@ def load_grid_from_file(
return gm.get_grid()


def construct_icon_grid(
Contributor:

If you only use this interface (passing all arrays to one function) for the grid construction in the wrapper, maybe you should also move it there? For the Python code I think it's nicer to use the builder pattern directly.

In any case, I think this should not be in test_utils. For practical reasons test_utils lives in the production code, but as the name says it is meant for testing, so I don't know whether you want the Fortran production code relying on test infrastructure.
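A hypothetical sketch of the builder-style construction the comment alludes to; the method names (with_config, with_start_end_indices, with_connectivities) are assumptions about the IconGrid API:

grid = (
    IconGrid()
    .with_config(grid_config)
    .with_start_end_indices(CellDim, cells_start_index, cells_end_index)
    .with_connectivities({C2EDim: c2e, E2CDim: e2c, V2EDim: v2e})
)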

@@ -370,6 +370,9 @@ def __init__(self, exchange: ExchangeRuntime = SingleNodeExchange()):
self.cell_params: Optional[CellParams] = None
self._horizontal_start_index_w_diffusion: int32 = 0

def set_exchange(self, exchange):
Contributor:

Do you really need to set this and make it mutable?

In general, we should discuss whether there is a way for your granule interfacing to work while keeping the distinction between init and __init__ in the Python granule.

@@ -297,7 +297,7 @@ def __post_init__(self, config):
object.__setattr__(
self,
"scaled_nudge_max_coeff",
config.nudge_max_coeff * DEFAULT_PHYSICS_DYNAMICS_TIMESTEP_RATIO,
Contributor:

Can you add a comment here saying that ICON already scales this by 5, and that it is therefore the responsibility of the user to set nudge_max_coeff accordingly?
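Something along these lines, for example (the exact wording is only a suggestion):

# Note: ICON already scales nudge_max_coeff by the physics/dynamics timestep
# ratio (5), so it is the user's responsibility to set nudge_max_coeff accordingly.
config.nudge_max_coeff * DEFAULT_PHYSICS_DYNAMICS_TIMESTEP_RATIO,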



def fortran_grid_indices_to_numpy_offset(inp) -> np.ndarray:
#return np.subtract(xp.asnumpy(inp.ndarray, order="F").copy(order="F"), 1)
Contributor:

Remove dead code



def fortran_grid_indices_to_numpy(inp) -> np.ndarray:
#return xp.asnumpy(inp.ndarray, order="F").copy(order="F")
Contributor:

Remove dead code

Comment on lines 182 to 202
cells_start_index_np = fortran_grid_indices_to_numpy_offset(cells_start_index)
vert_start_index_np = fortran_grid_indices_to_numpy_offset(vert_start_index)
edge_start_index_np = fortran_grid_indices_to_numpy_offset(edge_start_index)

cells_end_index_np = fortran_grid_indices_to_numpy(cells_end_index)
vert_end_index_np = fortran_grid_indices_to_numpy(vert_end_index)
edge_end_index_np = fortran_grid_indices_to_numpy(edge_end_index)

c_glb_index_np = fortran_grid_indices_to_numpy_offset(c_glb_index)
e_glb_index_np = fortran_grid_indices_to_numpy_offset(e_glb_index)
v_glb_index_np = fortran_grid_indices_to_numpy_offset(v_glb_index)

c_owner_mask_np = c_owner_mask.ndarray.copy(order="F")[0:num_cells]
e_owner_mask_np = e_owner_mask.ndarray.copy(order="F")[0:num_edges]
v_owner_mask_np = v_owner_mask.ndarray.copy(order="F")[0:num_verts]

c2e_loc = fortran_grid_connectivities_to_xp_offset(c2e)
c2e2c_loc = fortran_grid_connectivities_to_xp_offset(c2e2c)
v2e_loc = fortran_grid_connectivities_to_xp_offset(v2e)
e2c2v_loc = fortran_grid_connectivities_to_xp_offset(e2c2v)
e2c_loc = fortran_grid_connectivities_to_xp_offset(e2c)
Contributor:

As discussed, it would be cleaner to delegate this to a more generic function, or to call it in a loop, so you don't have multiple statements calling the same function over and over again.
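A sketch of the loop-based variant (the dict grouping and result names are assumptions; the converter functions are the ones from the snippet above):

start_and_global = {
    "cells_start": cells_start_index,
    "vert_start": vert_start_index,
    "edge_start": edge_start_index,
    "c_glb": c_glb_index,
    "e_glb": e_glb_index,
    "v_glb": v_glb_index,
}
offset_indices = {
    name: fortran_grid_indices_to_numpy_offset(arr) for name, arr in start_and_global.items()
}

connectivities = {
    name: fortran_grid_connectivities_to_xp_offset(conn)
    for name, conn in {"c2e": c2e, "c2e2c": c2e2c, "v2e": v2e, "e2c2v": e2c2v, "e2c": e2c}.items()
}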

vert_start_index_np = fortran_grid_indices_to_numpy_offset(vert_start_index)
edge_start_index_np = fortran_grid_indices_to_numpy_offset(edge_start_index)

cells_end_index_np = fortran_grid_indices_to_numpy(cells_end_index)
Contributor:

Please use clearer naming for these functions.

on_gpu=on_gpu,
limited_area=True if limited_area else False,
decomposition_info = (
DecompositionInfo(klevels=num_levels)
Contributor:

I would try to avoid array variable names with np suffixes unless strictly necessary. It would be nicer to keep the code as generic as possible (supporting both cupy and numpy arrays). If conversion is necessary somewhere, do it where it is needed.
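A sketch of the device-agnostic style the comment suggests, assuming xp resolves to numpy on CPU and cupy on GPU as elsewhere in the code:

def fortran_grid_indices_to_offset(inp):
    # stays generic: no forced host copy; the result lives wherever the input lives
    return xp.asarray(inp.ndarray) - 1

# convert with xp.asnumpy(...) only at the call sites that strictly need host arrays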

processor_props = get_multinode_properties(MultiNodeRun(), comm_id)
exchange = definitions.create_exchange(processor_props, decomposition_info)

# log.debug("icon_grid:cell_start%s", icon_grid.start_indices[CellDim])
Contributor:

Remove dead code, or, if these debug statements are useful, I would pack them into a separate function.

Contributor:

Also, if you decide on the separate function, see if you can make the debug statements less repetitive, e.g. by using for loops.
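A sketch of such a helper; the calls rely on the log level, so with anything above DEBUG they are cheap no-ops (the field selection is illustrative):

def log_grid_debug_info(icon_grid) -> None:
    for dim in (CellDim, EdgeDim, VertexDim):
        log.debug("icon_grid start indices for %s: %s", dim, icon_grid.start_indices[dim])
        log.debug("icon_grid end indices for %s: %s", dim, icon_grid.end_indices[dim])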

@@ -223,6 +360,9 @@ def diffusion_init(
geofac_grg_y=geofac_grg_y,
nudgecoeff_e=nudgecoeff_e,
)
global diffusion_granule
samkellerhals (Contributor) commented on Jun 18, 2024:

Since using globals is bad practice in Python, please put a comment here explaining why we are doing this.
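For example (the wording and the assignment target are suggestions only):

# A module-level global is needed here because the Fortran side calls
# diffusion_init and diffusion_run as free functions and cannot hold a Python
# object handle between the calls; the granule instance must survive in module state.
global diffusion_granule
diffusion_granule = diffusion  # assumption: the instance constructed above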


Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • launch jenkins spack

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.

@@ -50,6 +52,7 @@
if TYPE_CHECKING:
import mpi4py.MPI

ghex_arch = Architecture.GPU if device.name == "GPU" else Architecture.CPU
Contributor:

settings.device is an enum. Why not test directly for the enum value instead of using device.name? If you want to compare with a string, you could turn settings.py::Device into a string enum:

class Device(str, Enum):
    ...

or

class Device(StrEnum):
    ...
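With either variant the call site can then compare enum members directly (note StrEnum requires Python 3.11+; the GPU member name is taken from the surrounding code):

on_gpu = settings.device == Device.GPU
ghex_arch = Architecture.GPU if on_gpu else Architecture.CPU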

prognostic_state.w,
prognostic_state.theta_v,
prognostic_state.exner,
prognostic_state.w.ndarray[0 : self.grid.num_cells, :],
Contributor:

Did you try to get rid of these slices and it did not work? Do you know why, or what did not work? We should try to figure this out and handle it differently; it is very error-prone like this.

Contributor:

If the explicit bounds are needed, we could try to push them inside GHexMultiNodeExchange.exchange. What you use here is the "real" num_cells, not nproma, right?
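A hypothetical sketch of pushing the bounds into the exchange; everything beyond the method name exchange is an assumption about GHexMultiNodeExchange internals:

class GHexMultiNodeExchange:
    def exchange(self, dim, *fields):
        # slice to the grid's real size (num_cells, not nproma) internally,
        # so call sites can pass the full prognostic fields unsliced
        n = self._grid_size[dim]  # assumption: per-dimension owned sizes are known here
        views = [f.ndarray[0:n, :] for f in fields]
        return self._do_exchange(dim, views)  # assumption: existing internal entry point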
